Classifying informative and imaginative prose using complex networks

Authors

Henrique Ferraz de Arruda

Luciano da Fontoura Costa

Diego R. Amancio

Abstract

Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has improved several linguistic applications, including machine translation, automatic summarization and document classification. In document classification, many approaches have emphasized the semantic content of texts, as is the case with bag-of-words language models. This approach has certainly yielded reasonable performance. However, potentially useful features such as the structural organization of texts have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterizing texts.
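The following is a minimal Python sketch that makes the idea of a networked model of function words concrete. It rests on stated assumptions and is not the authors' implementation.

The sketch works in three steps:

1. It links consecutive words in an undirected word-adjacency network.
2. For a small hypothetical list of function words, it extracts each word's degree.
3. It computes a simplified accessibility value for each of those words.

Here accessibility is taken as the exponential of the Shannon entropy of the h-step random-walk distribution starting at the word. Several elements are assumptions made for illustration:

- the plain (non-self-avoiding) random-walk variant of accessibility;
- the function-word list;
- the step count `h`;
- the names `build_adjacency_network`, `accessibility` and `function_word_features`.

The symmetry measurements mentioned in the abstract are not sketched.

```python
import math
import re
from collections import defaultdict

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in"]


def build_adjacency_network(text):
    """Undirected word-adjacency network: each word is a node, and
    consecutive words in the text are linked by an edge."""
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for u, v in zip(words, words[1:]):
        if u != v:  # skip self-loops such as "very very"
            graph[u].add(v)
            graph[v].add(u)
    return graph


def accessibility(graph, source, h=2):
    """Simplified accessibility (assumed variant): exp of the Shannon
    entropy of a plain random walk's distribution after h steps."""
    if source not in graph:
        return 0.0
    probs = {source: 1.0}
    for _ in range(h):
        step = defaultdict(float)
        for node, p in probs.items():
            neighbours = graph[node]
            # Spread the node's probability uniformly over its neighbours.
            for nb in neighbours:
                step[nb] += p / len(neighbours)
        probs = step
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.exp(entropy)


def function_word_features(text, h=2):
    """Feature vector: (degree, accessibility) for each function word,
    with zeros when the word does not occur in the text."""
    graph = build_adjacency_network(text)
    features = []
    for word in FUNCTION_WORDS:
        if word in graph:
            features.extend([len(graph[word]), accessibility(graph, word, h)])
        else:
            features.extend([0, 0.0])
    return features
```

For example, `function_word_features("the cat and the dog", h=1)` gives "the" a degree of 3 and an accessibility of about 3. With a single step, this accessibility variant reduces to the number of distinct neighbours, which is a useful sanity check. Feature vectors of this kind could then be fed to any standard supervised classifier. This abstract does not state which classifiers the paper used.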

Similar resources

Computational Analysis Of Predicational Structures In English

The results of a computational analysis of all predications, finite and non-finite, in a one-million-word corpus of present-day American English (the "Brown Corpus") are presented. The analysis shows the nature of the syntactic differences among the various genres of writing represented in the data base, especially between informative prose and imaginative prose. The results also demonstrate th...

SUC-CORE: SUC 2.0 Annotated with NP Coreference

SUC-CORE is a subset of Stockholm Umeå Corpus 2.0 and Swedish Treebank, annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains.

SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference

This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains. This allows for exploration of coreference acros...

Grammatical word class variation within the British National Corpus Sampler

This paper examines the relationship between part-of-speech frequencies and text typology in the British National Corpus Sampler. Four pairwise comparisons of part-of-speech frequencies were made: written language vs. spoken language; informative writing vs. imaginative writing; conversational speech vs. ‘task-oriented’ speech; and imaginative writing vs. ‘task-oriented’ speech. The following v...

Comments on Nonfinite Adverbial Patterns in English Prose Fiction: A Simple Model for Analysis and Use

This study aims to present an accessible model of some frequent nonfinite adverbial types occurring in English prose fiction. As its main syntactic argument, it recognizes that these adverbials are mostly elliptical in that there are some dependent-clause markers one can assume to be implicit when supplying those elements back into the clause complex. Some comments are provided at the end on th...

Journal:

CoRR

Volume: abs/1507.07826

Pages: -

Publication date: 2015
